All Questions

2 votes
2 answers
301 views

Advantage computed the wrong way?

Here is the code written by Maxim Lapan. I am reading his book (Deep Reinforcement Learning Hands-On), and I have come across a line in his code that looks really weird. In the accumulation of the policy gradient $...
jgauth
1 vote
1 answer
257 views

Once the environments are vectorized, how should I gather immediate experiences for the agent?

My main goal right now is to train an agent using the A2C algorithm to solve the Atari Breakout game. So far I have succeeded in writing that code with a single agent and environment. To break the ...
jgauth